SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Hu, Yangliu, Song, Zikai, Feng, Na, Luo, Yawei, Yu, Junqing, Chen, Yi-Ping Phoebe, Yang, Wei

arXiv.org Artificial Intelligence

Video-based Large Language Models (Video-LLMs) have witnessed substantial advancements in recent years, propelled by progress in multi-modal LLMs. Although these models are proficient at providing overall descriptions of videos, they struggle with fine-grained understanding, particularly in aspects such as visual dynamics and video detail inquiries. To tackle these shortcomings, we find that fine-tuning Video-LLMs on self-supervised fragment tasks greatly improves their fine-grained video understanding abilities. Hence we propose two key contributions: (1) Self-Supervised Fragment Fine-Tuning (SF2T), a novel, effortless fine-tuning method that employs the rich inherent characteristics of videos for training while unlocking more fine-grained understanding ability in Video-LLMs. Moreover, it relieves researchers from labor-intensive annotation and smartly circumvents the limitations of natural language, which often fails to capture the complex spatiotemporal variations in videos; (2) a novel benchmark dataset, namely FineVidBench, for rigorously assessing Video-LLMs' performance at both the scene and fragment levels, offering a comprehensive evaluation of their capabilities. We assessed multiple models and validated the effectiveness of SF2T on them. Experimental results reveal that our approach improves their ability to capture and interpret spatiotemporal details.

1. Introduction

Large Language Models (LLMs) have showcased significant emergent capabilities, such as in-context learning [19], instruction-following [23], and chain-of-thought reasoning [30], driven by expansive datasets and advanced model architectures. Various Video-LLMs, exemplified by GPT4-V, VideoLLaMA 2 [4], MiniCPM-V [34], and Qwen2-VL [28], have been crafted by leading corporations and research institutions, demonstrating proficiency in capturing the overarching content of videos.

[Figure: Performance w/ and w/o SF2T. Four advanced Video-LLMs are evaluated w/ and w/o SF2T on the proposed FineVidBench against two baselines: (1) Base, performance without any fine-tuning (blue dashed), and (2) Base (SFT), performance with supervised fine-tuning (red dashed). After applying SF2T, all models show significant improvements (solid blue and red), underscoring its broad effectiveness.]
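Self-supervised fragment tasks of the kind this abstract describes can be generated directly from a video's own temporal structure, with no human annotation. The sketch below is a hypothetical illustration of one such objective (a temporal-ordering task over shuffled fragments); the function name and task format are invented here and are not taken from the paper.

```python
import random

def make_ordering_task(num_fragments, seed=0):
    """Build a self-supervised temporal-ordering task from fragment indices.

    The model would be shown the video fragments in `shuffled` order and
    asked to recover the true temporal order; `answer[i]` gives the position
    in the shuffled sequence of the i-th fragment in true temporal order.
    """
    rng = random.Random(seed)
    original = list(range(num_fragments))   # true temporal order 0..n-1
    shuffled = original[:]
    rng.shuffle(shuffled)
    answer = [shuffled.index(i) for i in original]
    return shuffled, answer
```

Because both the input and the supervision signal come from the video itself, such tasks sidestep the need for natural-language labels entirely.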


Feature Recommendation for Structural Equation Model Discovery in Process Mining

Qafari, Mahnaz Sadat, van der Aalst, Wil

arXiv.org Artificial Intelligence

Process mining techniques can help organizations to improve their operational processes. Organizations can benefit from process mining techniques in finding and amending the root causes of performance or compliance problems. Considering the volume of the data and the number of features captured by the information systems of today's companies, the task of discovering the set of features that should be considered in root cause analysis can be quite involved. In this paper, we propose a method for finding the set of (aggregated) features with a possible effect on the problem. The root cause analysis task is usually done by applying a machine learning technique to the data gathered from the information system supporting the processes. To prevent mixing up correlation and causation, which may happen when the findings of machine learning techniques are interpreted as causal, we propose a method for discovering the structural equation model of the process that can be used for root cause analysis. We have implemented the proposed method as a plugin in ProM and evaluated it using real and synthetic event logs. These experiments show the validity and effectiveness of the proposed methods.
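To make the structural-equation idea concrete: once a causal graph over the (aggregated) features is fixed, a linear structural equation model can be fitted by regressing each variable on its causal parents. The sketch below is a generic illustration in that spirit, not the ProM plugin's actual implementation; the function name and data layout are assumptions.

```python
import numpy as np

def fit_linear_sem(data, parents):
    """Fit one linear structural equation per variable by least squares.

    data:    dict mapping variable name -> 1-D numpy array of observations
    parents: dict mapping variable name -> list of causal parents, assumed
             to come from a known or estimated causal graph
    Returns dict: variable -> (intercept, {parent: coefficient}).
    """
    equations = {}
    for var, pars in parents.items():
        y = data[var]
        if not pars:
            # root variables are modeled by their mean alone
            equations[var] = (float(np.mean(y)), {})
            continue
        X = np.column_stack([np.ones(len(y))] + [data[p] for p in pars])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        equations[var] = (float(beta[0]),
                         dict(zip(pars, map(float, beta[1:]))))
    return equations
```

Reading a coefficient from such an equation, rather than from an unconstrained predictive model, is what licenses a causal (intervention-level) interpretation.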


Unbalanced Sobolev Descent

Mroueh, Youssef, Rigotti, Mattia

arXiv.org Machine Learning

We introduce Unbalanced Sobolev Descent (USD), a particle descent algorithm for transporting a high dimensional source distribution to a target distribution that does not necessarily have the same mass. We define the Sobolev-Fisher discrepancy between distributions and show that it relates to advection-reaction transport equations and the Wasserstein-Fisher-Rao metric between distributions. USD transports particles along gradient flows of the witness function of the Sobolev-Fisher discrepancy (advection step) and reweighs the mass of particles with respect to this witness function (reaction step). The reaction step can be thought of as a birth-death process of the particles with rate of growth proportional to the witness function. When the Sobolev-Fisher witness function is estimated in a Reproducing Kernel Hilbert Space (RKHS), under mild assumptions we show that USD converges asymptotically (in the limit of infinite particles) to the target distribution in the Maximum Mean Discrepancy (MMD) sense. We then give two methods to estimate the Sobolev-Fisher witness with neural networks, resulting in two Neural USD algorithms. The first one implements the reaction step with mirror descent on the weights, while the second implements it through a birth-death process of particles. We show on synthetic examples that USD transports distributions with or without conservation of mass faster than previous particle descent algorithms, and finally demonstrate its use for molecular biology analyses where our method is naturally suited to match developmental stages of populations of differentiating cells based on their single-cell RNA sequencing profile. Code is available at https://github.com/ibm/usd .
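The advection-reaction structure described above can be sketched in a few lines. The toy below uses a Gaussian-kernel mean-embedding difference as an MMD-style surrogate for the witness function (not the exact Sobolev-Fisher witness of the paper), moves particles along its gradient (advection), and exponentially reweights their mass by the witness value (reaction); all names and step sizes are illustrative.

```python
import numpy as np

def usd_step(xs, ws, ys, sigma=1.0, eta=0.5, lam=0.5):
    """One advection-reaction step in the spirit of USD (1-D toy).

    Surrogate witness: f(x) = mean_j k(x, y_j) - sum_i w_i k(x, x_i),
    with Gaussian kernel k. xs: (n,) source particles, ws: (n,) weights
    summing to 1, ys: (m,) target samples.
    """
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

    Kxy, Kxx = k(xs, ys), k(xs, xs)
    f = Kxy.mean(axis=1) - Kxx @ ws              # witness at each particle
    # gradient of the witness, using d/dx k(x,z) = -(x-z)/sigma^2 * k(x,z)
    grad = (-(xs[:, None] - ys[None, :]) / sigma ** 2 * Kxy).mean(axis=1)
    grad -= (-(xs[:, None] - xs[None, :]) / sigma ** 2 * Kxx) @ ws
    xs_new = xs + eta * grad                     # advection step
    ws_new = ws * np.exp(lam * f)                # reaction (birth-death) step
    return xs_new, ws_new / ws_new.sum()
```

The reaction step grows mass where the witness is positive (target-dense regions) and shrinks it elsewhere, which is what lets the method handle unequal total mass.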


Storage Fit Learning with Feature Evolvable Streams

Hou, Bo-Jian, Yan, Yu-Hu, Zhao, Peng, Zhou, Zhi-Hua

arXiv.org Machine Learning

Feature evolvable learning has been widely studied in recent years, where old features vanish and new features emerge when learning with streams. Conventional methods usually assume that a label will be revealed after prediction at each time step. In practice, however, this assumption may not hold, and no label is given at most time steps. To tackle this problem, we leverage the technique of manifold regularization to utilize previous similar data to assist the refinement of the online model. Nevertheless, this approach needs to store all previous data, which is impossible when learning with streams that arrive sequentially in large volume. Thus we need a buffer to store part of them. Considering that different devices may have different storage budgets, the learning approach should be flexible subject to the storage budget limit. In this paper, we propose a new setting: Storage-Fit Feature-Evolvable streaming Learning (SF2EL), which incorporates the issue of rarely-provided labels into feature evolution. Our framework is able to fit its behavior to different storage budgets when learning with feature evolvable streams with unlabeled data. Besides, both theoretical and empirical results validate that our approach preserves the merit of the original feature evolvable learning, i.e., it can always track the best baseline and thus perform well at any time step.
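A minimal sketch of the storage-budget idea: keep a fixed-size buffer of past instances (here via standard reservoir sampling, one simple policy among many; the paper's actual buffer strategy may differ) so that the manifold-regularization step can draw on previous similar data without unbounded memory. Class and attribute names are invented for illustration.

```python
import random

class StorageFitBuffer:
    """Fixed-budget buffer of stream instances via reservoir sampling.

    Maintains a uniform random sample of everything seen so far while
    never exceeding the device's storage budget.
    """
    def __init__(self, budget, seed=0):
        self.budget = budget
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.budget:
            self.items.append(item)          # still under budget: keep it
        else:
            # replace a stored item with probability budget / seen
            j = self.rng.randrange(self.seen)
            if j < self.budget:
                self.items[j] = item
```

Varying `budget` per device is what lets the same learner "fit" different storage limits.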


Discontinuity-Free Decision Support with Quantitative Argumentation Debates

Rago, Antonio (Imperial College London) | Toni, Francesca (Imperial College London) | Aurisicchio, Marco (Imperial College London) | Baroni, Pietro (Università degli Studi di Brescia)

AAAI Conferences

IBIS (Issue Based Information System) provides a widely adopted approach for knowledge representation especially suitable for the challenging task of representing wicked decision problems. While many tools for visualisation and collaborative development of IBIS graphs are available, automated decision support in this context is still underdeveloped, even though it would benefit several applications. QuAD (Quantitative Argumentation Debate) frameworks are a recently proposed IBIS-based formalism encompassing automated decision support by means of an algorithm for quantifying the strength of alternative decision options, based on aggregation of the strength of their attacking and supporting arguments. The initially proposed aggregation method, however, may give rise to discontinuities. In this paper we propose a novel, discontinuity-free algorithm for computing the strength of decision options in QuAD frameworks. We prove that this algorithm features several desirable properties and we compare the two aggregation methods, showing that both may be appropriate in the context of different application scenarios.
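One concrete way to realize a discontinuity-free aggregation of this kind is to combine attacker and supporter strengths separately with the probabilistic sum (which is continuous, commutative, and associative on [0, 1]) and then shift the option's base score by the resulting difference. The sketch below follows this general DF-QuAD-style recipe as a hedged reconstruction; the exact definitions proposed in the paper may differ.

```python
def prob_sum(values):
    """Aggregate strengths in [0, 1] with f(a, b) = a + b - a*b."""
    acc = 0.0
    for v in values:
        acc = acc + v - acc * v
    return acc

def combined_strength(base, attackers, supporters):
    """Continuously combine a base score in [0, 1] with the aggregated
    strengths of attacking and supporting arguments."""
    va, vs = prob_sum(attackers), prob_sum(supporters)
    if va >= vs:
        return base - base * (va - vs)        # net attack: scale down
    return base + (1.0 - base) * (vs - va)    # net support: scale up
```

Note that when aggregated attack and support are equal the result is exactly the base score, and small changes in any argument's strength produce small changes in the output, which is the continuity property the abstract emphasizes.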